feat(ascend): add top-k top-p sampler by zhangyue207 · Pull Request #603 · InfiniTensor/InfiniOps

zhangyue207 · 2026-05-12T11:38:18Z

Summary

Add the public TopKTopPSampler base API in src/base/top_k_top_p_sampler.h.
Add an Ascend ACLNN implementation in src/native/ascend/ops/top_k_top_p_sampler/kernel.h.
Add focused Ascend tests in tests/test_top_k_top_p_sampler.py.
Keep the branch rebased onto current upstream master (d67759e).

Motivation

N/A. No issue is linked.

This adds a logits-level Ascend sampler boundary needed by inference pipelines that already prepare top-k and top-p controls outside InfiniOps.

Type of Change

feat - new feature / new operator / new platform
fix - bug fix
perf - performance improvement (no behavioral change)
refactor - code restructuring without behavior change
test - adding or fixing tests only
docs - documentation only
build / ci - build system or CI configuration
chore - tooling, formatting, or other non-code changes
Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

Test Results on Supported Platforms

Platform	Built	`pytest` Result	Notes / Hardware
NVIDIA	CI queued	CI queued	Re-run triggered on head `5e36a1d`.
Iluvatar	CI queued	CI queued	Re-run triggered on head `5e36a1d`.
MetaX	CI queued	CI queued	Re-run triggered on head `5e36a1d`.
Cambricon	CI queued	CI queued	Re-run triggered on head `5e36a1d`.
Moore	CI queued	CI queued	Re-run triggered on head `5e36a1d`.
Ascend	CI queued	CI queued	Re-run triggered on head `5e36a1d`.

Local checks

git diff --check upstream/master...HEAD
clang-format --dry-run --Werror src/base/top_k_top_p_sampler.h src/native/ascend/ops/top_k_top_p_sampler/kernel.h
ruff format --check tests/test_top_k_top_p_sampler.py
ruff check tests/test_top_k_top_p_sampler.py

All commands passed locally after rebasing onto upstream/master d67759e.

GitHub CI re-run

Head SHA: 5e36a1d029c8cd3695cdb60bf8190dc5a9f25f22
Base SHA: d67759e0c59de09087a4e8774c8a5f4db7bd5134
Triggered at: 2026-06-30 06:11 UTC
Status at description update: ruff and matrix generation passed; clang-format in progress; platform unit jobs queued.

Benchmark / Performance Impact

N/A. This PR adds a correctness-tested operator wrapper and does not include a performance benchmark.

Notes for Reviewers

API alignment note:

Reference checked: vLLM vllm.v1.sample.ops.topk_topp_sampler.TopKTopPSampler, whose native path receives logits, generators, k, and p, applies top-k/top-p filtering, and samples from logits.
InfiniOps base signature: TopKTopPSampler(logits, k, p, out) and operator()(logits, k, p, out).
Intentional deviations: InfiniOps uses an output tensor instead of returning sampled ids, does not expose Python generators, and does not return processed logits/logprobs. Temperature scaling remains caller-owned.
Current Ascend ACLNN implementation is verified for the deterministic top_k == 1 subset; broader stochastic sampling is left out of this minimal PR.

Checklist

Title, Branch, and Commits

PR title follows Conventional Commits.
Branch name follows <type>/xxx-yyyy-zzzz.
Each commit message follows Conventional Commits.
Small PR is a single squashable commit.
No stray merge commits from master.
No fixup! / squash! / wip commits remain.

Scope and Design

Changes are minimal.
No dead code, commented-out blocks, debug prints, or ownerless TODO entries.
No unrelated formatting churn.
Public API changes are intentional, documented, and reflected in tests.

General Code Hygiene

The code is self-explanatory; comments were added only where the reason is non-obvious.
Every modified or added file ends with a single trailing newline.
No trailing whitespace, tab/space mixing, or stray BOMs.
Identifiers in comments and error messages are wrapped in backticks.
All comments and error messages are in English.
Comments and error messages are complete sentences unless the language/framework convention says otherwise.

C++ Specific

Python Specific

ruff format --check tests/test_top_k_top_p_sampler.py passes.
ruff check tests/test_top_k_top_p_sampler.py passes.
Framework-specific conventions are honored where applicable.
Function body spacing follows the project Python rules.

Testing

GitHub CI re-run was triggered by updating the PR branch.
New functionality has matching tests under tests/.
Tests use pytest.mark.parametrize and pytest.mark.auto_act_and_assert.
Default device parameterization is used.
N/A: No bug fix regression test is needed for this feature PR.

Build, CI, and Tooling

git diff --check passes.
No new runtime dependency was added.
No CMake or CI configuration changes remain in this PR.

Documentation

N/A: README.md / CONTRIBUTING.md changes are not needed for this small operator addition.
New public operator behavior is documented in the base header and tests.
N/A: No user-visible breaking change.

Security and Safety

No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
No third-party code was added.
No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

gongchensu · 2026-06-30T08:58:37Z

现在看起来好像只支持k=1是吗？

zhangyue207 · 2026-07-02T06:09:47Z

现在看起来好像只支持k=1是吗？

看起来是的，我补充一下

gongchensu · 2026-07-02T08:03:13Z

现在看起来好像只支持k=1是吗？

看起来是的，我补充一下

Ascend 路径现在还是不会真正 sample，只会选 argmax。
在 kernel.h (line 108) 调 aclnnTopKTopPSampleGetWorkspaceSize 时传的是 qOptional=nullptr。在CANN 8.5.1 和 CANN 9.0.0 里看了这个 ACLNN 算子的源码/声明：只有传了 q 才会走 QSampleCompute，否则最后是对 softmax 后的候选集合做 ArgMaxAndGather。
所以现在即使用户传 k=10 或 p=0.9，Ascend 也只是先过滤候选，然后仍然选最高 logit 的 token；不会像 CPU 实现那样随机采样。测试也没覆盖到这个，因为 test_top_k_top_p_sampler.py (line 23) 只测了 k=1，reference 也是 torch.argmax。
这版已经修掉了之前“强制 k=1”的表层问题，但 Ascend sampling 语义还没真正修好。要么需要给 ACLNN 传 [batch, vocab] 的 float32 q 随机/采样权重 tensor，要么把这个 op 的语义明确收窄成 greedy/top-k filtered argmax。

zhangyue207 · 2026-07-02T08:34:13Z

已处理：

Ascend 路径现在会生成并传入 [batch_size, vocab_size] 的 float32 q tensor，不再给 aclnnTopKTopPSample 传 qOptional=nullptr。
q 使用 exponential noise 填充，让 ACLNN 走 QSampleCompute 路径，而不是 filtered argmax。
保留了 k != 1 支持；测试仍只覆盖 k=1 的确定性 case，避免随机采样导致 flaky。
额外在 Ascend 容器里做过 k=2 smoke test，固定 logits 下多次运行能采到候选集合里的两个 token，确认不是恒定 argmax。
最新 CI 已全绿。

zhangyue207 force-pushed the feat/ascend-top-k-top-p-sampler branch 3 times, most recently from 2444a24 to 2e95fa7 Compare May 29, 2026 05:53

zhangyue207 force-pushed the feat/ascend-top-k-top-p-sampler branch from 2e95fa7 to 611cab2 Compare June 2, 2026 02:40

zhangyue207 force-pushed the feat/ascend-top-k-top-p-sampler branch from 611cab2 to 5e36a1d Compare June 30, 2026 06:11

zhangyue207 marked this pull request as ready for review July 2, 2026 05:28

zhangyue207 requested a review from a team July 2, 2026 05:28

zhangyue207 force-pushed the feat/ascend-top-k-top-p-sampler branch from 5e36a1d to 2e56167 Compare July 2, 2026 07:12

feat(ascend): add top-k top-p sampler

dc7fca9

zhangyue207 force-pushed the feat/ascend-top-k-top-p-sampler branch from 2e56167 to dc7fca9 Compare July 2, 2026 08:15

gongchensu approved these changes Jul 2, 2026

View reviewed changes

voltjia approved these changes Jul 2, 2026

View reviewed changes

voltjia merged commit 70e9d01 into InfiniTensor:master Jul 2, 2026
24 of 30 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(ascend): add top-k top-p sampler#603

feat(ascend): add top-k top-p sampler#603
voltjia merged 1 commit into
InfiniTensor:masterfrom
zhangyue207:feat/ascend-top-k-top-p-sampler

zhangyue207 commented May 12, 2026 •

edited

Loading

Uh oh!

gongchensu commented Jun 30, 2026

Uh oh!

zhangyue207 commented Jul 2, 2026

Uh oh!

gongchensu commented Jul 2, 2026

Uh oh!

zhangyue207 commented Jul 2, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

zhangyue207 commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Type of Change

Platforms Affected

Test Results on Supported Platforms

Benchmark / Performance Impact

Notes for Reviewers

Checklist

Title, Branch, and Commits

Scope and Design

General Code Hygiene

C++ Specific

Python Specific

Testing

Build, CI, and Tooling

Documentation

Security and Safety

Uh oh!

gongchensu commented Jun 30, 2026

Uh oh!

zhangyue207 commented Jul 2, 2026

Uh oh!

gongchensu commented Jul 2, 2026

Uh oh!

zhangyue207 commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

zhangyue207 commented May 12, 2026 •

edited

Loading

zhangyue207 commented Jul 2, 2026 •

edited

Loading